Automatic Creation of Knowledge Graphs from Digital Musical Document Libraries
Authors
Abstract
Most current musicological knowledge is found in printed books and manuscripts. In recent years, great efforts have been made to digitize these documents and make them available in the form of Digital Libraries. However, digital documents are mainly stored as raw text, with no more structure than indexes and some metadata. As a result, the knowledge implicit in the text is not understandable by computers and cannot be processed directly. Automatic processing of text documents may help musicologists in several ways, such as improving navigation through a library, discovering hidden knowledge, and accelerating tedious tasks. To apply these techniques to a Digital Library, the information contained in its documents must be carefully structured and semantically annotated. Information Extraction is a discipline of computer science focused on extracting structured information from unstructured text sources. We propose a method that uses Information Extraction techniques to automatically extract meaningful knowledge from documents in Digital Musical Document Libraries. Our method has two main steps. First, relevant named entities (e.g. composers, organizations, places) are identified in the text. Second, the words between these entities are syntactically and semantically analyzed to determine the relationship between them. The extracted knowledge is then represented in a machine-readable format as a knowledge graph, where entities are nodes and relations are edges. Finally, the resulting knowledge representation is visualized as an interactive graph. With this visualization, users can move from one document to another by browsing the knowledge graph. We tested our method on a subset of the artist biographies in Grove Music Online.
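The two extraction steps and the graph representation described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the gazetteer lookup stands in for a real named entity recognizer, the raw words between two entities stand in for the syntactic and semantic analysis, and the names GAZETTEER, find_entities, extract_triples and build_graph, as well as the sample sentences, are hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict

# Hypothetical gazetteer; an assumption standing in for a real entity recognizer.
GAZETTEER = {
    "Johann Sebastian Bach": "composer",
    "Leipzig": "place",
    "Thomasschule": "organization",
}


def find_entities(sentence):
    """Step 1: return (start, end, name, type) for each gazetteer match, in text order."""
    spans = []
    for name, etype in GAZETTEER.items():
        for match in re.finditer(re.escape(name), sentence):
            spans.append((match.start(), match.end(), name, etype))
    return sorted(spans)


def extract_triples(sentence):
    """Step 2 (simplified): label the relation with the words between adjacent entities."""
    entities = find_entities(sentence)
    triples = []
    for left, right in zip(entities, entities[1:]):
        between = sentence[left[1]:right[0]].strip(" ,.;")
        if between:
            triples.append((left[2], between, right[2]))
    return triples


def build_graph(sentences):
    """Represent entities as nodes and relations as labeled, directed edges."""
    graph = defaultdict(list)
    for sentence in sentences:
        for subject, relation, obj in extract_triples(sentence):
            graph[subject].append((relation, obj))
    return dict(graph)


if __name__ == "__main__":
    # Illustrative sentences only; they are not taken from the paper's corpus.
    sample = [
        "Johann Sebastian Bach worked in Leipzig.",
        "Johann Sebastian Bach taught at the Thomasschule.",
    ]
    for node, edges in build_graph(sample).items():
        print(node, "->", edges)
```

The adjacency-list dictionary keeps the sketch free of dependencies; a graph library or an RDF store would be a natural replacement if the graph had to be queried or visualized interactively.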
Similar resources
A Study of the Degree of Compliance with Knowledge Management Criteria on the Websites of Selected Digital Libraries in Iran
Background and Aim: Considering the elements of knowledge management (availability, creation, and transfer of knowledge) is very important for digital library websites and improves their performance. This paper therefore aims to identify the knowledge management criteria in the websites of selected Iranian digital libraries and to study the degree to which they are observed. Materials and Methods: The research method was des...
A Fuzzy Logic Based Expert System for Quality Assurance of Document Image Collections
Huge document image collections in digital libraries are prone to reduced quality and require automatic quality assurance. This paper presents an approach for bringing together information automatically aggregated from a quality assurance tool and expert knowledge related to digital preservation. The main contribution of this work is the definition of fuzzy expert rules and the application of f...
Expanding a Humanities Digital Library: Musical References in Cervantes' Works
Digital libraries focused on developing humanities resources for both scholarly and popular audiences face the challenge of bringing together digital resources built by scholars from different disciplines and subsequently integrating and presenting them. This challenge becomes more acute as libraries grow, both in terms of size and organizational complexity, making the traditional humanities pr...
Interfaces for Document Representation in Digital Music Libraries
Musical documents, that is, documents whose primary content is printed music, introduce interesting design challenges for presentation in an online environment. Considerations for the unique properties of printed music, as well as users’ expected levels of comfort with these materials, present opportunities for developing a viewer specifically tailored to displaying musical documents. This paper...
Header Metadata Extraction from Semi-structured Documents Using Template Matching
With the recent proliferation of documents, automatic metadata extraction from documents has become an important task. In this paper, we propose a novel template-matching-based method for header metadata extraction from semi-structured documents stored in PDF. In our approach, templates are defined, and the document is considered as strings with format. Templates are used to guide finite state auto...